Question 1

The Multivariate Gaussian distribution

The multivariate Gaussian distribution has the form

\begin{align} \mathcal{N}(x; \mu, \Sigma) &= |2\pi \Sigma|^{-1/2} \exp\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu) \right) \\ & = \exp\left(-\frac{1}{2} x^\top \Sigma^{-1} x + \mu^\top \Sigma^{-1} x - \frac{1}{2} \mu^\top \Sigma^{-1} \mu -\frac{1}{2}\log \det(2\pi \Sigma) \right) \\ \end{align}
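As a quick numerical sanity check of the identity above, we can evaluate the log-density directly and compare it against a library implementation (a minimal sketch; the particular values of $\mu$, $\Sigma$ and $x$ below are arbitrary choices):

In [ ]:
import numpy as np
from scipy.stats import multivariate_normal

# arbitrary 2-dimensional example
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])

# log N(x; mu, Sigma) evaluated via the formula above
d = x - mu
logp = -0.5*np.dot(d, np.linalg.solve(Sigma, d)) \
       - 0.5*np.log(np.linalg.det(2*np.pi*Sigma))

# compare against scipy's implementation
assert np.isclose(logp, multivariate_normal(mu, Sigma).logpdf(x))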

Gaussian Process Regression

In Bayesian machine learning, a frequently encountered problem is regression, where we are given pairs of inputs $x_i \in \mathbb{R}^N$ and associated noisy observations $y_i \in \mathbb{R}$. We assume the following model

\begin{eqnarray*} y_i &\sim& {\cal N}(y_i; f(x_i), R) \end{eqnarray*}

The interesting thing about a Gaussian process is that the function $f$ is not specified in closed form; instead, we assume that the function values \begin{eqnarray*} f_i & = & f(x_i) \end{eqnarray*} are jointly Gaussian distributed as \begin{eqnarray*} \left( \begin{array}{c} f_1 \\ \vdots \\ f_L \\ \end{array} \right) & = & f_{1:L} \sim {\cal N}(f_{1:L}; 0, \Sigma(x_{1:L})) \end{eqnarray*} Here, we define the entries of the covariance matrix $\Sigma(x_{1:L})$ as \begin{eqnarray*} \Sigma_{i,j} & = & K(x_i, x_j) \end{eqnarray*} for $i,j \in \{1, \dots, L\}$, where $K$ is a given covariance function. Now, if we wish to predict the value of $f$ at a new input $x$, we simply form the following joint distribution: \begin{eqnarray*} \left( \begin{array}{c} f_1 \\ f_2 \\ \vdots \\ f_L \\ f \\ \end{array} \right) & \sim & {\cal N}\left( \left(\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ \end{array}\right) , \left(\begin{array}{ccccc} K(x_1,x_1) & K(x_1,x_2) & \dots & K(x_1, x_L) & K(x_1, x) \\ K(x_2,x_1) & K(x_2,x_2) & \dots & K(x_2, x_L) & K(x_2, x) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ K(x_L,x_1) & K(x_L,x_2) & \dots & K(x_L, x_L) & K(x_L, x) \\ K(x,x_1) & K(x,x_2) & \dots & K(x, x_L) & K(x, x) \\ \end{array}\right) \right) \\ \left( \begin{array}{c} f_{1:L} \\ f \end{array} \right) & \sim & {\cal N}\left( \left(\begin{array}{c} \mathbf{0} \\ 0 \\ \end{array}\right) , \left(\begin{array}{cc} \Sigma(x_{1:L}) & k(x_{1:L}, x) \\ k(x_{1:L}, x)^\top & K(x, x) \\ \end{array}\right) \right) \\ \end{eqnarray*}

Here, $k(x_{1:L}, x)$ is an $L \times 1$ vector with entries $k_i$ where

\begin{eqnarray*} k_i = K(x_i, x) \end{eqnarray*}

Popular choices of covariance function for generating smooth regression functions include a bell-shaped one \begin{eqnarray*} K_1(x_i, x_j) & = & \exp\left(-\frac{1}{2} \| x_i - x_j \|^2 \right) \end{eqnarray*} and a Laplacian \begin{eqnarray*} K_2(x_i, x_j) & = & \exp\left(-\frac{1}{2} \| x_i - x_j \| \right) \end{eqnarray*}

where $\| x \| = \sqrt{x^\top x}$ is the Euclidean norm.
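To see how these two choices differ, one can draw sample paths from the corresponding GP priors; $K_1$ yields very smooth functions while $K_2$ yields much rougher ones (a minimal sketch; the grid, jitter value and number of samples are arbitrary choices):

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-5, 5, 200)
D = np.abs(xs.reshape((-1, 1)) - xs.reshape((1, -1)))  # pairwise distances

for k, (name, Sigma) in enumerate([('$K_1$ (bell)', np.exp(-0.5*D**2)),
                                   ('$K_2$ (Laplacian)', np.exp(-0.5*D))]):
    # Cholesky factor, with a small diagonal jitter for numerical stability
    L_chol = np.linalg.cholesky(Sigma + 1e-6*np.eye(len(xs)))
    plt.subplot(1, 2, k+1)
    plt.plot(xs, np.dot(L_chol, np.random.randn(len(xs), 3)))  # three samples
    plt.title(name)
plt.show()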

Part 1

From your notes, derive the expressions to compute the predictive density \begin{eqnarray*} p(\hat{y}| y_{1:L}, x_{1:L}, \hat{x}) \end{eqnarray*}

\begin{eqnarray*} p(\hat{y} | y_{1:L}, x_{1:L}, \hat{x}) &=& {\cal N}(\hat{y}; m, S) \\ m & = & \\ S & = & \end{eqnarray*}
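As a reminder, the derivation reduces to the standard conditioning identity for jointly Gaussian variables: if \begin{eqnarray*} \left( \begin{array}{c} a \\ b \end{array} \right) & \sim & {\cal N}\left( \left(\begin{array}{c} \mu_a \\ \mu_b \end{array}\right), \left(\begin{array}{cc} A & C \\ C^\top & B \end{array}\right) \right) \end{eqnarray*} then \begin{eqnarray*} p(b | a) & = & {\cal N}\left(b;\; \mu_b + C^\top A^{-1} (a - \mu_a),\; B - C^\top A^{-1} C \right) \end{eqnarray*} Remember to account for the observation noise $R$ when forming the joint covariance of $y_{1:L}$ and $\hat{y}$.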

Part 2

Write a program to compute the mean and covariance of $p(\hat{y}| y_{1:L}, x_{1:L}, \hat{x})$ and generate a plot of the resulting predictive density for the following data:

 x = [-2 -1 0 3.5 4]
 y = [4.1 0.9 2 12.3 15.8]

Try different covariance functions $K_1$ and $K_2$ and observation noise covariances $R$ and comment on the nature of the approximation.

Part 3

Suppose we are using a covariance function parameterised by \begin{eqnarray*} K_\beta(x_i, x_j) & = & \exp\left(-\frac{1}{\beta} \| x_i - x_j \|^2 \right) \end{eqnarray*} Find the optimum regularisation parameter $\beta^*(R)$ as a function of the observation noise variance via maximisation of the marginal likelihood, i.e. \begin{eqnarray*} \beta^* & = & \arg\max_{\beta} p(y_{1:L}| x_{1:L}, \beta, R) \end{eqnarray*} Generate a plot of $\beta^*(R)$ for $R = 0.01, 0.02, \dots, 1$ for the dataset given in Part 2.

Here, remember that \begin{eqnarray*} p(y_{1:L}| x_{1:L}, \beta, R) = \mathcal{N}(y_{1:L}; 0, \Sigma_\beta(x_{1:L}) + R I) \end{eqnarray*} where $\Sigma_\beta(x_{1:L})$ is the covariance matrix built from $K_\beta$ and $I$ is the $L \times L$ identity matrix.

For your reference, we give a basic implementation below:


In [ ]:
import numpy as np
import matplotlib.pyplot as plt

def cov_fun_bell(x1, x2, delta=1):
    # bell-shaped (squared exponential) covariance K_1
    return np.exp(-0.5*np.abs(x1-x2)**2/delta)

def cov_fun_exp(x1, x2):
    # Laplacian covariance K_2
    return np.exp(-0.5*np.abs(x1-x2))

def cov_fun(x1, x2):
    return cov_fun_bell(x1, x2, delta=1)

R = 0.05  # observation noise variance

x = np.array([-2, -1, 0, 3.5, 4])
y = np.array([4.1, 0.9, 2, 12.3, 15.8])

# covariance of the noisy observations: Sigma(x_{1:L}) + R*I
Sig = cov_fun(x.reshape((len(x),1)), x.reshape((1,len(x)))) + R*np.eye(len(x))
SigI = np.linalg.inv(Sig)

xx = np.linspace(-5, 5, 100)  # test inputs
yy = np.zeros_like(xx)        # predictive mean m
ss = np.zeros_like(xx)        # predictive variance S
for i in range(len(xx)):
    CrossSig = cov_fun(x, xx[i])          # k(x_{1:L}, x)
    PriorSig = cov_fun(xx[i], xx[i]) + R  # K(x, x) plus observation noise

    yy[i] = np.dot(np.dot(CrossSig, SigI), y)
    ss[i] = PriorSig - np.dot(np.dot(CrossSig, SigI), CrossSig)

# data, predictive mean and +/- 3 standard deviation band
plt.plot(x, y, 'or')
plt.plot(xx, yy, 'b.')
plt.plot(xx, yy + 3*np.sqrt(ss), 'b:')
plt.plot(xx, yy - 3*np.sqrt(ss), 'b:')
plt.show()
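For Part 3, one simple approach is a grid search over $\beta$ for each value of $R$, evaluating the log marginal likelihood given above directly (a minimal sketch, reusing x and y from the cell above; the search grid for $\beta$ is an arbitrary choice):

In [ ]:
from scipy.stats import multivariate_normal

def log_marglik(beta, x, y, R):
    # log p(y_{1:L} | x_{1:L}, beta, R) with Sigma_ij = exp(-||x_i - x_j||^2 / beta)
    D2 = (x.reshape((-1, 1)) - x.reshape((1, -1)))**2
    Sigma = np.exp(-D2/beta) + R*np.eye(len(x))
    return multivariate_normal(np.zeros(len(x)), Sigma).logpdf(y)

Rs = np.arange(0.01, 1.005, 0.01)
betas = np.linspace(0.1, 20, 400)  # assumed search range
beta_star = [betas[np.argmax([log_marglik(b, x, y, R_) for b in betas])]
             for R_ in Rs]

plt.plot(Rs, beta_star)
plt.xlabel('$R$')
plt.ylabel(r'$\beta^*(R)$')
plt.show()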

Question 2

Read http://mbmlbook.com/TrueSkill.html and http://mbmlbook.com/TrueSkill_Modelling_the_outcome_of_games.html

  • Write a program that produces 10,000 samples from a Gaussian with zero mean and a standard deviation of 1. Compute the percentage of these samples which lie between -1 and 1, between -2 and 2 and between -3 and 3. You should find that these percentages are close to those given in the caption of Figure 3.4.

  • Construct a histogram of the samples created in the previous exercise (like the ones in Figure 3.7) and verify that it resembles a bell-shaped curve.

  • Compute the mean, standard deviation and variance of your samples, referring to Panel 3.1. The mean should be close to zero and the standard deviation and variance should both be close to 1 (since 1^2=1).

  • Produce a second set of 10,000 samples from a Gaussian with mean 1 and standard deviation 1. Plot a scatter plot like Figure 3.8 where the X co-ordinate of each point is a sample from the first set and the Y co-ordinate is the corresponding sample from the second set (pairing the first sample from each set, the second sample from each set and so on). Compute the fraction of samples which lie above the diagonal line where X=Y.

  • Create double variables X and Y with priors of Gaussian(0,1) and Gaussian(1,1) respectively. Define a third random variable Ywins that equals Y > X. Compute the posterior distribution over Ywins numerically and verify that it is close to the fraction of samples above the diagonal in the previous exercise (a sketch covering all of these exercises is given below).
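A minimal NumPy/SciPy sketch of these exercises (the book works in Infer.NET; here the posterior over Ywins is computed with the closed-form result $P(Y > X) = \Phi(1/\sqrt{2})$, which holds because $Y - X \sim \mathcal{N}(1, 2)$; the random seed is an arbitrary choice):

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)

# 10,000 samples from N(0, 1)
xs = rng.normal(0.0, 1.0, size=10000)
for k in (1, 2, 3):
    print('fraction within +/- %d: %.3f' % (k, np.mean(np.abs(xs) < k)))

# histogram of the samples (should resemble a bell-shaped curve)
plt.hist(xs, bins=50)
plt.show()

# sample statistics
print('mean %.3f, std %.3f, var %.3f' % (xs.mean(), xs.std(), xs.var()))

# second set of samples from N(1, 1), scatter plot, fraction above X = Y
ys = rng.normal(1.0, 1.0, size=10000)
plt.scatter(xs, ys, s=1)
plt.plot([-4, 5], [-4, 5], 'k')  # the diagonal X = Y
plt.show()
print('fraction with Y > X: %.3f' % np.mean(ys > xs))

# exact posterior over Ywins: Y - X ~ N(1, 2), so P(Y > X) = Phi(1/sqrt(2))
print('P(Y > X) = %.3f' % norm.cdf(1/np.sqrt(2)))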